The relative log-concavity ordering $\le_{\mathrm{lc}}$ between probability mass functions (pmf's) on non-negative integers is studied. Given three pmf's $f, g, h$ that satisfy $f \le_{\mathrm{lc}} g \le_{\mathrm{lc}} h$, we present a pair of (reverse) triangle inequalities: if $\sum_i i f_i = \sum_i i g_i < \infty$, then

\[ D(f|h) \ge D(f|g) + D(g|h) \]

and if $\sum_i i g_i = \sum_i i h_i < \infty$, then

\[ D(h|f) \ge D(h|g) + D(g|f), \]

where $D(\cdot|\cdot)$ denotes the Kullback-Leibler divergence. These inequalities, interesting in themselves, are also applied to several problems, including maximum entropy characterizations of Poisson and binomial distributions and the best binomial approximation in relative entropy. We also present parallel results for continuous distributions and discuss the behavior of $\le_{\mathrm{lc}}$ under convolution.

Keywords: Bernoulli sum; binomial approximation; Hoeffding's inequality; maximum entropy; 
minimum entropy; negative binomial approximation; Poisson approximation; relative entropy 

1. Introduction and main result 

A non-negative sequence $u = \{u_i, i \ge 0\}$ is log-concave if (a) the support of $u$ is an interval in $\mathbb{Z}_+ = \{0, 1, \ldots\}$ and (b) $u_i^2 \ge u_{i+1} u_{i-1}$ for all $i$ or, equivalently, $\log(u_i)$ is concave in $\mathrm{supp}(u)$. Such sequences occur naturally in combinatorics, probability and statistics, for example, as probability mass functions (pmf's) of many discrete distributions. Given two pmf's $f = \{f_0, f_1, \ldots\}$ and $g = \{g_0, g_1, \ldots\}$ on $\mathbb{Z}_+$, we say that $f$ is log-concave relative to $g$, written as $f \le_{\mathrm{lc}} g$, if

1. each of $f$ and $g$ is supported on an interval on $\mathbb{Z}_+$;

2. $\mathrm{supp}(f) \subset \mathrm{supp}(g)$;

3. $\log(f_i/g_i)$ is concave in $\mathrm{supp}(f)$.
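The three conditions above can be checked numerically for finitely supported pmf's. The following is a minimal sketch, not part of the original paper: the function names `rel_log_concave`, `is_interval` and `binom_pmf` are illustrative, pmf's are assumed to be given as Python lists indexed by 0, 1, 2, ..., and a small tolerance absorbs floating-point error.

```python
import math

def is_interval(support):
    """True if a sorted list of integers has no gaps."""
    return all(b - a == 1 for a, b in zip(support, support[1:]))

def rel_log_concave(f, g, tol=1e-12):
    """Check the three conditions defining f <=_lc g for list-valued pmf's."""
    supp_f = [i for i, x in enumerate(f) if x > 0]
    supp_g = [i for i, x in enumerate(g) if x > 0]
    # Condition 1: both supports are intervals.
    if not (is_interval(supp_f) and is_interval(supp_g)):
        return False
    # Condition 2: supp(f) is contained in supp(g).
    if not set(supp_f) <= set(supp_g):
        return False
    # Condition 3: second differences of log(f_i/g_i) are non-positive on supp(f).
    r = [math.log(f[i] / g[i]) for i in supp_f]
    return all(r[j - 1] - 2 * r[j] + r[j + 1] <= tol
               for j in range(1, len(r) - 1))

def binom_pmf(n, p):
    """Binomial pmf bi(n, p) as a list of length n + 1."""
    return [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
```

For instance, `rel_log_concave(binom_pmf(3, 0.4), binom_pmf(5, 0.2))` holds, while swapping the arguments fails condition 2.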



This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 
We have $f \le_{\mathrm{lc}} f$ (assuming interval support) and $f \le_{\mathrm{lc}} g$, $g \le_{\mathrm{lc}} h \Rightarrow f \le_{\mathrm{lc}} h$. In other words, $\le_{\mathrm{lc}}$ defines a pre-order among discrete distributions with interval supports on $\mathbb{Z}_+$. When $g$ is a geometric pmf, $f \le_{\mathrm{lc}} g$ simply means that $f$ is log-concave; when $g$ is a binomial or Poisson pmf and $f \le_{\mathrm{lc}} g$, then $f$ is ultra log-concave [23] (see Section 2).

Whitt [27] discusses this particular ordering and illustrates its usefulness with a queueing theory example. Yu [30] uses $\le_{\mathrm{lc}}$ to derive simple conditions that imply other stochastic orders such as the usual stochastic order, the hazard rate order and the likelihood ratio order. Stochastic orders play an important role in diverse areas, including reliability theory and survival analysis ([2, 7]); see Shaked and Shanthikumar [24] for a book-length treatment. In this paper, we are concerned with entropy relations between distributions under $\le_{\mathrm{lc}}$. The investigation is motivated by maximum entropy characterizations of binomial and Poisson distributions (see Section 2). For a random variable $X$ on $\mathbb{Z}_+$ with pmf $f$, the Shannon entropy is defined as

\[ H(X) = H(f) = -\sum_{i=0}^{\infty} f_i \log(f_i). \]

By convention, $0\log(0) = 0$. The relative entropy (Kullback and Leibler [19]; Kullback [18]; Csiszár and Shields [5]) between pmf's $f$ and $g$ on $\mathbb{Z}_+$ is defined as

\[ D(f|g) = \begin{cases} \sum_i f_i \log(f_i/g_i), & \text{if } \mathrm{supp}(f) \subset \mathrm{supp}(g), \\ \infty, & \text{otherwise.} \end{cases} \]

By convention, $0\log(0/0) = 0$. We state our main result.

Theorem 1. Let $f, g, h$ be pmf's on $\mathbb{Z}_+$ such that $f \le_{\mathrm{lc}} g \le_{\mathrm{lc}} h$. If $f$ and $g$ have finite and equal means, then $D(f|h) < \infty$ and

\[ D(f|h) \ge D(f|g) + D(g|h); \tag{1.1} \]

if $h$ and $g$ have finite and equal means, then

\[ D(h|f) \ge D(h|g) + D(g|f). \tag{1.2} \]
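As a quick sanity check of inequality (1.1), the sketch below evaluates both sides on a small hand-picked example. The helper `kl` and the three pmf's are illustrative assumptions, not taken from the paper; `h` lists only the first three terms of a geometric pmf, which suffices because the divergences involved only use `h` on the support of `f` and `g`.

```python
import math

def kl(f, g):
    """Relative entropy D(f|g) for pmf's given as lists indexed by 0, 1, 2, ..."""
    total = 0.0
    for i, fi in enumerate(f):
        if fi == 0:
            continue  # convention: 0 log(0/0) = 0
        gi = g[i] if i < len(g) else 0.0
        if gi == 0:
            return math.inf  # supp(f) is not contained in supp(g)
        total += fi * math.log(fi / gi)
    return total

# Illustrative pmf's on {0, 1, 2} (assumptions for this sketch):
f = [0.2, 0.6, 0.2]      # mean 1
g = [0.25, 0.5, 0.25]    # bi(2, 1/2), mean 1; f/g = (0.8, 1.2, 0.8) is log-concave
h = [0.5, 0.25, 0.125]   # first terms of the geometric pmf ge(1/2); g is log-concave

lhs = kl(f, h)               # D(f|h)
rhs = kl(f, g) + kl(g, h)    # D(f|g) + D(g|h)
```

Here `f` and `g` share the mean 1 and satisfy the ordering with `h`, so Theorem 1 predicts `lhs >= rhs`.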



Theorem 1 has an appealing geometric interpretation. (With a slight abuse of notation, we write the mean of a pmf $g$ as $E(g) = \sum_i i g_i$.) If $g$ and $h$ satisfy $E(g) < \infty$ and $g \le_{\mathrm{lc}} h$, then (1.1) gives

\[ D(g|h) = \inf_{f \in F} D(f|h), \qquad F = \{f : f \le_{\mathrm{lc}} g,\ E(f) = E(g)\}. \]

That is, $g$ is the I-projection of $h$ onto $F$. Relation (1.2) can be interpreted similarly. See Csiszár and Shields [5] for general definitions and properties of the I-projection and the related reverse I-projection.
While Theorem 1 is interesting in itself, it can also be used to derive several classical and new entropy comparison results. We therefore defer its proof to Section 3, after considering these applications. We conclude in Section 4 with extensions to continuous distributions. Throughout, we also discuss the behavior of $\le_{\mathrm{lc}}$ under convolution, as this becomes relevant in a few places.

2. Some implications of Theorem 1 

Theorem 1 is used to unify and generalize classical results on maximum entropy characterizations of Poisson and binomial distributions in Section 2.1 and to determine the best binomial approximation to a sum of independent Bernoulli random variables (in relative entropy) in Section 2.2. Section 2.3 contains analogous results for the negative binomial. Theorem 1 also implies monotonicity (in terms of relative entropy) in certain Poisson limit theorems.

2.1. Maximum entropy properties of binomial and Poisson 
distributions 

Throughout this subsection (and in Section 2.2), let $X_1, \ldots, X_n$ be independent Bernoulli random variables with $\Pr(X_i = 1) = 1 - \Pr(X_i = 0) = p_i$, $0 < p_i < 1$. Define $S = \sum_{i=1}^n X_i$ and $\bar{p} = (1/n)\sum_{i=1}^n p_i$.

A theorem of Shepp and Olkin [25] (see also [22] and [10]) states that

\[ H(S) \le H(\mathrm{bi}(n, \bar{p})), \tag{2.1} \]

where $\mathrm{bi}(n, p)$ denotes the binomial pmf with $n$ trials and probability $p$ for success. In other words, subject to a fixed mean $n\bar{p}$, the entropy of $S$ is maximized when all $p_i$ are equal. Karlin and Rinott [16] (see also Harremoës [10]) note the corresponding result

\[ H(S) \le H(\mathrm{po}(n\bar{p})), \tag{2.2} \]

where $\mathrm{po}(\lambda)$ denotes the Poisson pmf with mean $\lambda$.
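Inequalities (2.1) and (2.2) are easy to illustrate numerically. The sketch below is an illustration only: the success probabilities in `ps` are arbitrary choices, the helper names are hypothetical, and the Poisson pmf is truncated at 60 terms, which is an approximation with negligible error for the small mean used here.

```python
import math

def bernoulli_sum_pmf(ps):
    """pmf of a sum of independent Bernoulli(p_i) variables, by repeated convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for i, x in enumerate(pmf):
            nxt[i] += x * (1 - p)
            nxt[i + 1] += x * p
        pmf = nxt
    return pmf

def binom_pmf(n, p):
    return [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def poisson_pmf(lam, size):
    """First `size` terms of the Poisson pmf with mean lam (truncated)."""
    return [math.exp(-lam) * lam**i / math.factorial(i) for i in range(size)]

def entropy(f):
    """Shannon entropy with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in f if x > 0)

ps = [0.1, 0.3, 0.5, 0.7]            # illustrative values
n, p_bar = len(ps), sum(ps) / len(ps)

h_sum = entropy(bernoulli_sum_pmf(ps))        # H(S)
h_bin = entropy(binom_pmf(n, p_bar))          # H(bi(n, p_bar))
h_poi = entropy(poisson_pmf(n * p_bar, 60))   # H(po(n p_bar)), truncated
```

For these values one finds `h_sum <= h_bin <= h_poi`, in line with (2.1), (2.2) and the fact that the binomial itself is a Bernoulli sum.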

Johnson [13] gives a generalization of (2.2) to ultra log-concave (ULC) distributions. The notion of ultra log-concavity was introduced by Pemantle [23] in the study of negative dependence. A pmf $f$ on $\mathbb{Z}_+$ is ULC of order $k$ if $f_i/\binom{k}{i}$ is log-concave in $i$; it is ULC of order $\infty$, or simply ULC, if $i!\,f_i$ is log-concave. Equivalently, these definitions can be stated with the $\le_{\mathrm{lc}}$ notation:

1. $f$ is ULC of order $k$ if $f \le_{\mathrm{lc}} \mathrm{bi}(k, p)$ for some $p \in (0, 1)$ (the value of $p$ does not affect the definition);

2. $f$ is ULC of order $\infty$ if $f \le_{\mathrm{lc}} \mathrm{po}(\lambda)$ for some $\lambda > 0$ (the value of $\lambda$ does not affect the definition).
An example is the distribution of $S$ in (2.2) and (2.1). Denoting the pmf of $S$ by $f^S$, we have

\[ f^S \le_{\mathrm{lc}} \mathrm{bi}(n, \bar{p}), \tag{2.3} \]

which can be shown to be a reformulation of Newton's inequalities (Hardy et al. [9]). Also, note that, as can be verified using the definition, $f$ being ULC of order $k$ means that it is also ULC of orders $k+1, k+2, \ldots, \infty$. Another notable property of ULC distributions, expressed in our notation, is due to Liggett [21].

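Theorem 2 ([21]). If $f \le_{\mathrm{lc}} \mathrm{bi}(k, p)$ and $g \le_{\mathrm{lc}} \mathrm{bi}(m, p)$, $p \in (0, 1)$, then

\[ f * g \le_{\mathrm{lc}} \mathrm{bi}(k + m, p), \]

where $f * g = \{\sum_{i=0}^{j} f_i g_{j-i},\ j = 0, \ldots, k + m\}$ denotes the convolution of $f$ and $g$.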

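This is a strong result; it implies (2.3) trivially. Simply observe that $\mathrm{bi}(1, p_i) \le_{\mathrm{lc}} \mathrm{bi}(1, p)$, $i = 1, \ldots, n$, and apply Theorem 2 to obtain $f^S = \mathrm{bi}(1, p_1) * \cdots * \mathrm{bi}(1, p_n) \le_{\mathrm{lc}} \mathrm{bi}(n, p)$, that is, $f^S$ is ULC of order $n$. A limiting case of Theorem 2 also holds: for pmf's $f$ and $g$ on $\mathbb{Z}_+$, we have

\[ f \le_{\mathrm{lc}} \mathrm{po}(\lambda),\ g \le_{\mathrm{lc}} \mathrm{po}(\mu) \ \Longrightarrow\ f * g \le_{\mathrm{lc}} \mathrm{po}(\lambda + \mu). \]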
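The convolution argument behind (2.3) can be traced on a concrete case. The following sketch is illustrative only: `convolve` and `is_ulc_of_order` are hypothetical helper names, the probabilities in `ps` are arbitrary, and the check assumes a pmf with strictly positive entries on {0, ..., len(f) - 1} and an order k of at least len(f) - 1.

```python
import math

def convolve(f, g):
    """Convolution of two finitely supported pmf's given as lists."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out

def is_ulc_of_order(f, k, tol=1e-12):
    """Check that f_i / C(k, i) is log-concave on {0, ..., len(f) - 1}.

    Assumes all entries of f are positive and k >= len(f) - 1.
    """
    r = [f[i] / math.comb(k, i) for i in range(len(f))]
    return all(r[i] ** 2 >= r[i - 1] * r[i + 1] * (1 - tol)
               for i in range(1, len(r) - 1))

ps = [0.1, 0.3, 0.5, 0.7]   # illustrative success probabilities
f_s = [1.0]
for p in ps:
    # Each factor bi(1, p_i) is trivially ULC of order 1.
    f_s = convolve(f_s, [1 - p, p])
```

With these values `is_ulc_of_order(f_s, 4)` holds, and so does the check for any larger order, matching the remark that ULC of order k implies ULC of orders k + 1, k + 2, and so on.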

The following generalization of (2.2) is proved by Johnson [13]. 

Theorem 3 ([13]). If a pmf $f$ on $\mathbb{Z}_+$ is ULC, then

\[ H(f) \le H(\mathrm{po}(E(f))). \]

Johnson's proof uses two operations, namely convolution with a Poisson pmf and binomial thinning, to construct a semigroup action on the set of ULC distributions with a fixed mean. The entropy is then shown to be monotone along this semigroup. A corresponding generalization of (2.1) appears in Yu [28]. The proof adopts the idea of Johnson [13] and is likewise non-trivial.

Theorem 4 ([28]). If a pmf $f$ is ULC of order $n$, then

\[ H(f) \le H(\mathrm{bi}(n, E(f)/n)). \]

We point out that Theorems 3 and 4 can be deduced from Theorem 1; in fact, both 
are special cases of the following result. 



Theorem 5. Any log-concave pmf $g$ on $\mathbb{Z}_+$ is the unique maximizer of entropy in the set $F = \{f : f \le_{\mathrm{lc}} g,\ E(f) = E(g)\}$.
Proof. The log-concavity of $g$ ensures that $\lambda = E(g) < \infty$. Letting $f \in F$ and using the geometric pmf $\mathrm{ge}(p) = \{p(1-p)^i,\ i = 0, 1, \ldots\}$, we get

\[ D(f|\mathrm{ge}(p)) = -H(f) - \log(p) - \lambda\log(1-p), \]
\[ D(g|\mathrm{ge}(p)) = -H(g) - \log(p) - \lambda\log(1-p), \]

which also shows that $H(f) < \infty$ and $H(g) < \infty$. Since $f \le_{\mathrm{lc}} g \le_{\mathrm{lc}} \mathrm{ge}(p)$, Theorem 1 yields

\[ -H(f) \ge D(f|g) - H(g) \ge -H(g) \]

so that $H(f) \le H(g)$ for all $f \in F$, with equality if and only if $D(f|g) = 0$, that is, $f = g$. □

Theorems 3 and 4 are obtained by noting that both $\mathrm{po}(\lambda)$ and $\mathrm{bi}(n, p)$ are log-concave. For recent extensions of Theorems 3 and 4 to compound distributions, see [14] and [31].
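For readers who want the intermediate algebra in the proof of Theorem 5, the identity for the geometric pmf and the cancellation step can be written out as follows. This is a worked expansion of the steps stated in the proof, with $\lambda = E(f) = E(g)$ and $\mathrm{ge}(p)_i = p(1-p)^i$:

\[ D(f|\mathrm{ge}(p)) = \sum_i f_i \log f_i - \sum_i f_i \bigl[\log p + i \log(1-p)\bigr] = -H(f) - \log p - \lambda \log(1-p). \]

The same computation applies to $g$. Substituting both expressions into (1.1) with $h = \mathrm{ge}(p)$ gives

\[ -H(f) - \log p - \lambda\log(1-p) \ \ge\ D(f|g) - H(g) - \log p - \lambda\log(1-p), \]

and cancelling the common terms leaves $-H(f) \ge D(f|g) - H(g)$, which is the displayed inequality in the proof.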



2.2. Best binomial approximations in relative entropy 

Recall that $S = \sum_{i=1}^n X_i$ is a sum of independent Bernoulli random variables, each with success probability $p_i$. Let $\lambda = \sum_{i=1}^n p_i$ and let $f^S$ denote the pmf of $S$. Approximating $S$ with a Poisson distribution $\mathrm{Po}(\lambda)$ is an old problem (Le Cam [20], Chen [3], Barbour et al. [1]). Approximating $S$ with a binomial $\mathrm{Bi}(n, \bar{p})$, $\bar{p} = \lambda/n$, has also been considered (Stein [26], Ehm [8]). The results are typically stated in terms of the total variation distance, defined for pmf's $f$ and $g$ as $V(f, g) = \frac{1}{2}\sum_i |f_i - g_i|$. For example, Ehm [8] applies the method of Stein and Chen to derive the bound ($\bar{q} = 1 - \bar{p}$)

\[ V(f^S, \mathrm{bi}(n, \bar{p})) \le (1 - \bar{p}^{\,n+1} - \bar{q}^{\,n+1})\,[(n+1)\bar{p}\bar{q}]^{-1} \sum_{i=1}^n (p_i - \bar{p})^2. \]

Here, we are concerned with the following problem: what is the best $m$, $m \ge n$, and $p \in (0, 1)$ for approximating $S$ with $\mathrm{Bi}(m, p)$? Intuition says $\mathrm{Bi}(n, \bar{p})$. Indeed, Choi and Xia [4] study this in terms of the total variation distance $d_m = V(f^S, \mathrm{bi}(m, \lambda/m))$ and prove that under certain conditions, for large enough $m$, $d_m$ increases with $m$.

Theorem 6 ([4]). Let $r = \lfloor\lambda\rfloor$ be the integer part of $\lambda$ and let $\delta = \lambda - r$. If $r > 1 + (1 + \delta)^2$ and

\[ m \ge \max\{n,\ \lambda^2/(r - 1 - (1 + \delta)^2)\}, \]

then $d_m < d_{m+1} < V(f^S, \mathrm{po}(\lambda))$.

The derivation of Theorem 6 is somewhat involved. However, if we consider this problem in terms of relative entropy rather than total variation, then Theorem 7 below gives a definite and equally intuitive answer. Similar results (see Section 2.3) hold for the negative binomial approximation of a sum of independent geometric random variables.
Theorem 7. Suppose that $m' \ge m \ge n$, $p' \in (0, 1)$. Then,

\[ D(f^S|\mathrm{bi}(m', p')) \ge D(f^S|\mathrm{bi}(m, \lambda/m)) + D(\mathrm{bi}(m, \lambda/m)|\mathrm{bi}(m', p')) \tag{2.4} \]

and therefore

\[ D(f^S|\mathrm{bi}(m', p')) \ge D(f^S|\mathrm{bi}(m, \lambda/m)) \ge D(f^S|\mathrm{bi}(n, \bar{p})). \]

Proof. Let $f = f^S$, $g = \mathrm{bi}(m, \lambda/m)$ and $h = \mathrm{bi}(m', p')$ in Theorem 1. By (2.3), we have $f \le_{\mathrm{lc}} \mathrm{bi}(n, \bar{p}) \le_{\mathrm{lc}} g \le_{\mathrm{lc}} h$. The claim follows from (1.1). □

Theorem 7 shows that, for approximating $S$ in the sense of relative entropy,

1. $\mathrm{Bi}(m, \lambda/m)$, which has the same mean as $S$, is preferable to $\mathrm{Bi}(m, p')$, $p' \ne \lambda/m$;

2. $\mathrm{Bi}(n, \bar{p})$ is preferable to $\mathrm{Bi}(m, \lambda/m)$, $m > n$.

Obviously, the proof of (2.4) still applies when $\mathrm{bi}(m', p')$ is replaced by $\mathrm{po}(\lambda)$. Hence,

\[ D(f^S|\mathrm{po}(\lambda)) \ge D(f^S|\mathrm{bi}(n, \bar{p})) + D(\mathrm{bi}(n, \bar{p})|\mathrm{po}(\lambda)), \tag{2.5} \]

that is, $\mathrm{Po}(\lambda)$ is worse than $\mathrm{Bi}(n, \bar{p})$ by at least $D(\mathrm{bi}(n, \bar{p})|\mathrm{po}(\lambda))$.

We conclude this subsection with another interesting result in the form of a corollary of Theorem 1. Writing $b_m = \mathrm{bi}(m, \lambda/m)$ for simplicity, we have

\[ D(b_m|\mathrm{po}(\lambda)) \ge D(b_m|b_{m+1}) + D(b_{m+1}|\mathrm{po}(\lambda)) \]

and, therefore,

\[ D(b_m|\mathrm{po}(\lambda)) > D(b_{m+1}|\mathrm{po}(\lambda)), \qquad m > \lambda. \tag{2.6} \]

That is, the limit $\mathrm{Bi}(m, \lambda/m) \to \mathrm{Po}(\lambda)$, $m \to \infty$, is monotone in relative entropy. As simple as (2.6) may seem, it is difficult to derive it directly without Theorem 1, which perhaps explains why (2.6) appears new, even though the binomial-to-Poisson limit is common knowledge.
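Both (2.4) and (2.5) can be checked on a small example. The sketch below is an illustration under stated assumptions: the probabilities `ps`, the choices of `m`, `m_prime` and `p_prime`, and all helper names are hypothetical, and the Poisson pmf is truncated at 40 terms, which is more than enough to cover the supports involved.

```python
import math

def bernoulli_sum_pmf(ps):
    """pmf of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for i, x in enumerate(pmf):
            nxt[i] += x * (1 - p)
            nxt[i + 1] += x * p
        pmf = nxt
    return pmf

def binom_pmf(m, p):
    return [math.comb(m, i) * p**i * (1 - p)**(m - i) for i in range(m + 1)]

def poisson_pmf(lam, size):
    return [math.exp(-lam) * lam**i / math.factorial(i) for i in range(size)]

def kl(f, g):
    """D(f|g), assuming supp(f) lies inside supp(g) and len(f) <= len(g)."""
    return sum(x * math.log(x / g[i]) for i, x in enumerate(f) if x > 0)

ps = [0.1, 0.3, 0.5, 0.7]           # illustrative values
n, lam = len(ps), sum(ps)
f_s = bernoulli_sum_pmf(ps)
m, m_prime, p_prime = 6, 9, 0.25    # any m' >= m >= n and p' in (0, 1)

g = binom_pmf(m, lam / m)
h = binom_pmf(m_prime, p_prime)
lhs_24 = kl(f_s, h)                 # left side of (2.4)
rhs_24 = kl(f_s, g) + kl(g, h)      # right side of (2.4)

b_n = binom_pmf(n, lam / n)
po = poisson_pmf(lam, 40)
lhs_25 = kl(f_s, po)                # left side of (2.5)
rhs_25 = kl(f_s, b_n) + kl(b_n, po) # right side of (2.5)
```

In this example the left sides dominate the right sides, and `kl(f_s, g) >= kl(f_s, b_n)` as the second chain of inequalities in Theorem 7 predicts.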
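The monotonicity in (2.6) can be observed directly by tabulating the divergence for increasing m. This is a minimal sketch with an arbitrary mean `lam`; the helper names are hypothetical and the Poisson pmf is truncated at 80 terms, which covers every binomial support used.

```python
import math

def binom_pmf(m, p):
    return [math.comb(m, i) * p**i * (1 - p)**(m - i) for i in range(m + 1)]

def poisson_pmf(lam, size):
    return [math.exp(-lam) * lam**i / math.factorial(i) for i in range(size)]

def kl(f, g):
    """D(f|g), assuming supp(f) lies inside supp(g) and len(f) <= len(g)."""
    return sum(x * math.log(x / g[i]) for i, x in enumerate(f) if x > 0)

lam = 1.6                     # illustrative mean
po = poisson_pmf(lam, 80)

# D(b_m | po(lam)) for m = 2, ..., 30, where b_m = bi(m, lam/m) and m > lam.
divs = [kl(binom_pmf(m, lam / m), po) for m in range(2, 31)]
```

The list `divs` is strictly decreasing and stays positive, consistent with (2.6) and with the limit Bi(m, λ/m) → Po(λ).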

2.3. Analogous results for the negative binomial 

Let $T$ be a sum of geometric random variables, $T = \sum_{i=1}^n Y_i$, where $Y_i \sim \mathrm{Ge}(r_i)$ independently, $r_i \in (0, 1)$. Denote the mean of $T$ by $\mu = \sum_{i=1}^n (1 - r_i)/r_i$ and denote the pmf of $T$ by $f^T$. Let $\mathrm{nb}(n, r) = \{\binom{n+i-1}{i} r^n (1-r)^i,\ i = 0, 1, \ldots\}$ denote the pmf of the negative binomial $\mathrm{NB}(n, r)$.

The counterpart of (2.1) appears in Karlin and Rinott [16].

Theorem 8 ([16]). $H(T) \ge H(\mathrm{nb}(n, n/(n + \mu)))$.

In other words, subject to a fixed mean $\mu$, the entropy of $T$ is minimized when all $r_i$ are equal. Theorem 8 can be generalized as follows.
Theorem 9. Any log-concave pmf $f$ is the unique minimizer of entropy in the set $G = \{g : f \le_{\mathrm{lc}} g \le_{\mathrm{lc}} \mathrm{ge}(p),\ E(g) = E(f)\}$, $p \in (0, 1)$.

Note that Theorem 9 is just a reformulation of Theorem 5, which follows from Theorem 1. To show that Theorem 9 indeed implies Theorem 8, we need the following inequality of Hardy et al. [9], written in our notation as

\[ \mathrm{nb}(n, n/(n + \mu)) \le_{\mathrm{lc}} f^T. \tag{2.7} \]

We also need $f^T$ to be log-concave, but this holds because convolutions of log-concave sequences are also log-concave.

Next, we consider the problem of selecting the best $m$, $m \ge n$, and $r \in (0, 1)$ for approximating $T$ with $\mathrm{NB}(m, r)$.

Theorem 10. Suppose $m' \ge m \ge n$ and $r' \in (0, 1)$. Write $\mathrm{nb}_m = \mathrm{nb}(m, m/(m + \mu))$ as shorthand. Then,

\[ D(f^T|\mathrm{nb}(m', r')) \ge D(f^T|\mathrm{nb}_m) + D(\mathrm{nb}_m|\mathrm{nb}(m', r')) \]

and, therefore,

\[ D(f^T|\mathrm{nb}(m', r')) \ge D(f^T|\mathrm{nb}_m) \ge D(f^T|\mathrm{nb}(n, n/(n + \mu))). \]

Proof. The relations

\[ \mathrm{nb}(m', r') \le_{\mathrm{lc}} \mathrm{nb}_m \le_{\mathrm{lc}} \mathrm{nb}(n, n/(n + \mu)) \]

are easy to verify. We also have (2.7). The claim follows from (1.2). □

Theorem 10 implies that for approximating $T$ in the sense of relative entropy, $\mathrm{NB}(n, n/(n + \mu))$ is no worse than $\mathrm{NB}(m', r')$ whenever $m' \ge n$. The counterpart of (2.5) also holds ($\mathrm{nb}_n = \mathrm{nb}(n, n/(n + \mu))$):

\[ D(f^T|\mathrm{po}(\mu)) \ge D(f^T|\mathrm{nb}_n) + D(\mathrm{nb}_n|\mathrm{po}(\mu)), \]

that is, $\mathrm{Po}(\mu)$ is worse than $\mathrm{NB}(n, n/(n + \mu))$ by at least $D(\mathrm{nb}_n|\mathrm{po}(\mu))$.

In addition, parallel to (2.6), we have

\[ D(\mathrm{nb}_m|\mathrm{po}(\mu)) > D(\mathrm{nb}_{m'}|\mathrm{po}(\mu)), \qquad m' > m > 0, \tag{2.8} \]

that is, the limit $\mathrm{NB}(m, m/(m + \mu)) \to \mathrm{Po}(\mu)$, $m \to \infty$, is monotone in relative entropy. Note that in (2.8), $m$ and $m'$ need not be integers; similarly in Theorem 10.
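Since (2.8) allows non-integer m, a numerical illustration needs the negative binomial pmf for real-valued shape. The sketch below is an assumption-laden illustration: it works with log-pmf's via `math.lgamma` to avoid underflow, truncates the infinite sum at 400 terms (the neglected tail is tiny for the values used), and uses hypothetical helper names and an arbitrary mean `mu`.

```python
import math

def log_nb(m, r, i):
    """log of the nb(m, r) pmf at i, valid for real m > 0."""
    return (math.lgamma(m + i) - math.lgamma(i + 1) - math.lgamma(m)
            + m * math.log(r) + i * math.log(1 - r))

def log_po(mu, i):
    """log of the po(mu) pmf at i."""
    return -mu + i * math.log(mu) - math.lgamma(i + 1)

def kl_nb_po(m, mu, size=400):
    """Truncated approximation of D(nb_m | po(mu)), nb_m = nb(m, m/(m + mu))."""
    r = m / (m + mu)
    total = 0.0
    for i in range(size):
        lf = log_nb(m, r, i)
        total += math.exp(lf) * (lf - log_po(mu, i))
    return total

mu = 2.0                                  # illustrative mean
ms = [0.5, 1.0, 2.5, 5.0, 10.0, 50.0]     # includes non-integer shapes
divs = [kl_nb_po(m, mu) for m in ms]
```

The values in `divs` decrease as m grows, in agreement with (2.8).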

We conclude this subsection with a problem on the behavior of $\le_{\mathrm{lc}}$ under convolution. Analogous to Theorem 2 is the following result of Davenport and Pólya ([6], Theorem 2), rephrased in terms of $\le_{\mathrm{lc}}$.

Theorem 11 ([6]). Suppose that pmf's $f$ and $g$ on $\mathbb{Z}_+$ satisfy $\mathrm{nb}(k, r) \le_{\mathrm{lc}} f$, $\mathrm{nb}(m, r) \le_{\mathrm{lc}} g$ for $k, m > 0$, $r \in (0, 1)$. Their convolution $f * g$ then satisfies

\[ \mathrm{nb}(k + m, r) \le_{\mathrm{lc}} f * g. \]

Actually, Davenport and Pólya [6] assume that $k + m = 1$, so their conclusion is the log-convexity of $f * g$, but it is readily verified that the same proof works for all positive $k$ and $m$. The limiting case also holds, that is,

\[ \mathrm{po}(\lambda) \le_{\mathrm{lc}} f,\ \mathrm{po}(\mu) \le_{\mathrm{lc}} g \ \Longrightarrow\ \mathrm{po}(\lambda + \mu) \le_{\mathrm{lc}} f * g. \]

An open problem is to determine general conditions that ensure

\[ f \le_{\mathrm{lc}} f',\ g \le_{\mathrm{lc}} g' \ \Longrightarrow\ f * g \le_{\mathrm{lc}} f' * g'. \tag{2.9} \]

Theorem 2 simply says that (2.9) holds if $f' = \mathrm{bi}(k, p)$ and $g' = \mathrm{bi}(m, p)$ with the same $p$ and Theorem 11 says that (2.9) holds if $f = \mathrm{nb}(k, r)$ and $g = \mathrm{nb}(m, r)$ with the same $r$. The proofs of Theorems 11 and 2 (Theorem 2 especially) are non-trivial. It is reasonable to ask whether there exist other interesting and non-trivial instances of (2.9).

3. Proof of Theorem 1 

The proof of Theorem 1 hinges on the following lemma that dates back to Karlin and Novikoff [15] and Karlin and Studden [17]. Our assumptions are slightly different from those of Karlin and Studden [17], Lemma XI.7.2. In the proof (included for completeness), the number of sign changes of a sequence is counted discarding zero terms.

Lemma 1 ([17]). Let $a_i$, $i = 0, 1, \ldots$, be a real sequence such that $\sum_{i=0}^{\infty} a_i = 0$ and $\sum_{i=0}^{\infty} i a_i = 0$. Suppose that the set $C = \{i : a_i > 0\}$ is an interval on $\mathbb{Z}_+$. For any concave function $w(i)$ on $\mathbb{Z}_+$, we then have

\[ \sum_{i=0}^{\infty} w(i) a_i \ge 0. \tag{3.1} \]

Proof. Karlin and Studden ([17], Lemma XI.7.2) assume that $a_i$, $i = 0, 1, \ldots$, changes sign exactly twice, with sign sequence $-, +, -$. However, it also suffices to assume that $C$ is an interval. Suppose that $a_i$ changes sign exactly once, with sign sequence $+, -$, that is, there exists $0 \le k < \infty$ such that $a_i \ge 0$, $0 \le i \le k$, with strict inequality for at least one $i \le k$, and $a_i \le 0$, $i > k$. Then,

\[ \sum_{i=0}^{\infty} i a_i \le \sum_{i=0}^{k} k a_i + \sum_{i=k+1}^{\infty} (k+1) a_i = -\sum_{i=0}^{k} a_i < 0, \]

a contradiction. Similarly, the sign sequence cannot be $-, +$ either. Assuming that $C$ is an interval, this shows that, except for the trivial case $a_i \equiv 0$, the sequence $a_i$ changes sign exactly twice, with sign sequence $-, +, -$.

The rest of the argument is well known. We proceed to show that the sequence $A_j = \sum_{i=0}^{j} a_i$ has exactly one sign change, with sign sequence $-, +$. Similarly, $\sum_{i=0}^{j} A_i \le 0$ for all $j = 0, 1, \ldots$, which implies (3.1) for every concave function $w(i)$ upon applying summation by parts. □
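The summation-by-parts step at the end of the proof can be spelled out as follows. This is a sketch of the standard computation, written with the auxiliary notation $B_j = \sum_{l=0}^{j} A_l$ (introduced here for convenience) and under the assumption that the boundary terms vanish as $N \to \infty$. Applying summation by parts twice to a partial sum gives

\[ \sum_{i=0}^{N} w(i) a_i = w(N) A_N + \bigl[w(N-1) - w(N)\bigr] B_{N-1} + \sum_{j=0}^{N-2} B_j \bigl[w(j) - 2w(j+1) + w(j+2)\bigr]. \]

Concavity of $w$ makes each second difference $w(j) - 2w(j+1) + w(j+2)$ non-positive, and $B_j \le 0$ for all $j$, so every term in the last sum is non-negative. Letting $N \to \infty$ then yields (3.1).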

Theorem 12 below is a consequence of Lemma 1. Although not phrased as such, the basic idea is implicit in Karlin and Studden [17] in their analyses of special cases; see also Whitt [27]. When $f$ is the pmf of a sum of $n$ independent Bernoulli random variables and $g = \mathrm{bi}(n, E(f)/n)$, as discussed in Section 2, Theorem 12 reduces to an inequality of Hoeffding [11].

Theorem 12. Suppose that two pmf's $f$ and $g$ on $\mathbb{Z}_+$ satisfy $f \le_{\mathrm{lc}} g$ and $E(f) = E(g) < \infty$. For any concave function $w(i)$ on $\mathbb{Z}_+$, we then have

\[ \sum_{i=0}^{\infty} f_i w(i) \ge \sum_{i=0}^{\infty} g_i w(i). \]

Proof. Since $E(g) < \infty$ and $w$ is concave, $\sum_{i=0}^{\infty} g_i w(i)$ either converges absolutely or diverges to $-\infty$. Assume the former. Since $\log(f_i/g_i)$ is concave and hence unimodal, the set $C = \{i : f_i - g_i > 0\}$ must be an interval. The result then follows from Lemma 1 applied to $a_i = f_i - g_i$. □
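The Hoeffding-type special case of Theorem 12 lends itself to a quick numerical illustration. In the sketch below, the probabilities `ps`, the selection of concave test functions and all helper names are illustrative assumptions; `f_s` is the pmf of a Bernoulli sum and `g` is the binomial pmf with the same mean.

```python
import math

def bernoulli_sum_pmf(ps):
    """pmf of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for i, x in enumerate(pmf):
            nxt[i] += x * (1 - p)
            nxt[i + 1] += x * p
        pmf = nxt
    return pmf

def binom_pmf(n, p):
    return [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def expect(pmf, w):
    """Expectation of w(i) under a pmf given as a list."""
    return sum(x * w(i) for i, x in enumerate(pmf))

ps = [0.1, 0.3, 0.5, 0.7]                 # illustrative values
n, p_bar = len(ps), sum(ps) / len(ps)
f_s = bernoulli_sum_pmf(ps)               # f, with f <=_lc g by (2.3)
g = binom_pmf(n, p_bar)                   # g = bi(n, E(f)/n)

# A few concave functions on the non-negative integers.
concave_fns = {
    "sqrt": math.sqrt,
    "log1p": math.log1p,
    "neg_sq": lambda i: -(i - 2.0) ** 2,
    "cap": lambda i: min(i, 1.5),
}
gaps = {name: expect(f_s, w) - expect(g, w) for name, w in concave_fns.items()}
```

Every entry of `gaps` is non-negative here. For the quadratic test function the gap equals the difference of the two variances, since the means coincide.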

Theorem 1 is a consequence of Theorem 12. Actually, we prove a slightly more general "quadrangle inequality," which may be of interest. Theorem 1 corresponds to the special case $g = g'$ in Theorem 13.

Theorem 13. Let $f, g, g', h$ be pmf's on $\mathbb{Z}_+$ such that $f \le_{\mathrm{lc}} g \le_{\mathrm{lc}} g' \le_{\mathrm{lc}} h$. If $E(f) = E(g) < \infty$, then $D(f|h) < \infty$ and

\[ D(f|h) + D(g|g') \ge D(f|g') + D(g|h); \tag{3.2} \]

if $E(g') = E(h) < \infty$, then

\[ D(h|f) + D(g'|g) \ge D(g'|f) + D(h|g). \tag{3.3} \]

Proof. The concavity of $\log(f_i/h_i)$ and $E(f) < \infty$ imply $D(f|h) < \infty$. Likewise for $D(g|h)$. Thus, (3.2) can be written as

\[ D(f|h) - D(f|g') \ge D(g|h) - D(g|g') \]

or, equivalently,

\[ \sum_i f_i \log(g'_i/h_i) \ge \sum_i g_i \log(g'_i/h_i). \tag{3.4} \]

Since $\log(g'_i/h_i)$ is concave in $\mathrm{supp}(g')$, and $\mathrm{supp}(f) \subset \mathrm{supp}(g) \subset \mathrm{supp}(g')$, (3.4) follows directly from Theorem 12.

To prove (3.3), we may assume $D(h|f) < \infty$ and $D(g'|g) < \infty$. These imply, in particular, that $\mathrm{supp}(f) = \mathrm{supp}(g') = \mathrm{supp}(h)$. We get

\[ \sum_i g'_i \log(f_i/g_i) \ge \sum_i h_i \log(f_i/g_i) \]

and (3.3) follows as before. □



4. The continuous case 

For probability density functions (pdf's) $f$ and $g$ with respect to Lebesgue measure on $\mathbb{R}$, the differential entropy of $f$ and the relative entropy between $f$ and $g$ are defined, respectively, as

\[ H(f) = -\int_{-\infty}^{\infty} f(x)\log(f(x))\,dx \quad\text{and}\quad D(f|g) = \int_{-\infty}^{\infty} f(x)\log(f(x)/g(x))\,dx. \]

Parallel to the discrete case, let us write $f \le_{\mathrm{lc}} g$ if

1. $\mathrm{supp}(f)$ and $\mathrm{supp}(g)$ are both intervals on $\mathbb{R}$;

2. $\mathrm{supp}(f) \subset \mathrm{supp}(g)$; and

3. $\log(f(x)/g(x))$ is concave in $\mathrm{supp}(f)$.

There then holds a continuous analog of Theorem 1 (with its first phrase replaced by "Let $f, g, h$ be pdf's on $\mathbb{R}$"); the proof is similar and is hence omitted.

The following maximum/minimum entropy result parallels Theorems 5 and 9.

Theorem 14. If a pdf $g$ on $\mathbb{R}$ is log-concave, then it maximizes the differential entropy in the set $F = \{f : f \le_{\mathrm{lc}} g,\ E(f) = E(g)\}$. Alternatively, if a pdf $f$ on $\mathbb{R}$ is log-concave, then it minimizes the differential entropy in the set $G = \{g : f \le_{\mathrm{lc}} g,\ g \text{ is log-concave and } E(g) = E(f)\}$.

We illustrate Theorem 14 with a minimum entropy characterization of the gamma distribution. This parallels Theorem 8 for the negative binomial. Denote by $\mathrm{gam}(\alpha, \beta)$ the pdf of the gamma distribution $\mathrm{Gam}(\alpha, \beta)$, that is,

\[ \mathrm{gam}(x; \alpha, \beta) = \beta^{-\alpha} x^{\alpha-1} e^{-x/\beta}/\Gamma(\alpha), \qquad x > 0. \]

Theorem 15. Let $\alpha_i \ge 1$, $\beta_i > 0$ and let $X_i \sim \mathrm{Gam}(\alpha_i, 1)$, $i = 1, \ldots, n$, independently. Define $S = \sum_{i=1}^n \beta_i X_i$. Then, subject to a fixed mean $ES = \sum_{i=1}^n \alpha_i\beta_i$, the differential entropy of $S$ (as a function of $\beta_i$, $i = 1, \ldots, n$) is minimized when all $\beta_i$ are equal.

Note that Theorem 3.1 of Karlin and Rinott ([16]; see also Yu [29]) implies that Theorem 15 holds when all $\alpha_i$ are equal. We use $\le_{\mathrm{lc}}$ to give an extension to general $\alpha_i \ge 1$.
A useful result is Lemma 2, which reformulates Theorem 4 of Davenport and Pólya [6]. As in Theorem 11, Davenport and Pólya assume $\alpha_1 + \alpha_2 = 1$, but the proof works for all positive $\alpha_1, \alpha_2$.

Lemma 2 ([6], Theorem 4). Let $\alpha_1, \alpha_2 > 0$ and let $f$ and $g$ be pdf's on $(0, \infty)$ such that $\mathrm{gam}(\alpha_1, 1) \le_{\mathrm{lc}} f$ and $\mathrm{gam}(\alpha_2, 1) \le_{\mathrm{lc}} g$. Then,

\[ \mathrm{gam}(\alpha_1 + \alpha_2, 1) \le_{\mathrm{lc}} f * g, \]

where $(f * g)(x) = \int_0^x f(y) g(x - y)\,dy$.

Proof of Theorem 15. Repeated application of Lemma 2 yields

\[ \mathrm{gam}(\alpha_+, 1) \le_{\mathrm{lc}} f^S, \tag{4.1} \]

where $\alpha_+ = \sum_{i=1}^n \alpha_i$ and $f^S$ denotes the pdf of $S$. Alternatively, we can show (4.1) by noting that $f^S$ is a mixture of $\mathrm{gam}(\alpha_+, \beta)$, where $\beta$ has the distribution of $\sum_{i=1}^n \beta_i X_i / \sum_{i=1}^n X_i$ (see, e.g., [27] and [30]). Since $\alpha_i \ge 1$, each $X_i$ is log-concave and so is $f^S$. The claim follows from Theorem 14. □

Weighted sums of gamma variates, as in Theorem 15, arise naturally in statistical 
contexts, for example, as quadratic forms in normal variables, but their distributions can 
be non-trivial to compute (Imhof [12]). When comparing different gamma distributions 
as convenient approximations, we obtain a result similar to Theorems 7 and 10. The 
proof, also similar, is omitted. 

Theorem 16. Fix $\alpha_i > 0$, $\beta_i > 0$ and let $X_i \sim \mathrm{Gam}(\alpha_i, 1)$, $i = 1, \ldots, n$, independently. Define $S = \sum_{i=1}^n \beta_i X_i$, with pdf $f^S$. Write $g_a = \mathrm{gam}(a, \sum_{i=1}^n \beta_i\alpha_i/a)$ as shorthand. For $b > 0$ and $a' \ge a \ge \alpha_+$, where $\alpha_+ = \sum_{i=1}^n \alpha_i$, we then have

\[ D(f^S|\mathrm{gam}(a', b)) \ge D(f^S|g_a) + D(g_a|\mathrm{gam}(a', b)) \]

and, consequently,

\[ D(f^S|\mathrm{gam}(a', b)) \ge D(f^S|g_a) \ge D(f^S|g_{\alpha_+}). \]

In other words, to approximate $S$ in the sense of relative entropy, $\mathrm{Gam}(\alpha_+, \sum_{i=1}^n \beta_i\alpha_i/\alpha_+)$, which has the same mean as $S$, is no worse than $\mathrm{Gam}(a, b)$ whenever $a \ge \alpha_+$. Note that, unlike in Theorem 15, we do not require here that $\alpha_i \ge 1$.
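The ordering among gamma densities that drives Theorem 16 can be verified by a one-line computation. As a worked illustration, for shapes $a' \ge a > 0$ and arbitrary scales $b, c > 0$ (the symbol $c$ is introduced here only for this calculation), the definition of the gamma pdf gives

\[ \log\frac{\mathrm{gam}(x; a', b)}{\mathrm{gam}(x; a, c)} = (a' - a)\log x - x\left(\frac{1}{b} - \frac{1}{c}\right) + \text{const}, \qquad x > 0. \]

The linear term does not affect concavity, and $(a' - a)\log x$ is concave exactly when $a' \ge a$. Hence $\mathrm{gam}(a', b) \le_{\mathrm{lc}} \mathrm{gam}(a, c)$ whenever $a' \ge a$, regardless of the scales, which is the continuous counterpart of the relations used in the proof of Theorem 10.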

Overall, there is a remarkable parallel between the continuous and discrete cases. 

Acknowledgments 

The author would like to thank three referees for their constructive comments. 






Y. Yu 



References 

[1] Barbour, A.D., Holst, L. and Janson, S. (1992). Poisson Approximation. Oxford Studies in Probability 2. Oxford: Clarendon Press. MR1163825
[2] Barlow, R.E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing. New York: Holt, Rinehart & Winston. MR0438625
[3] Chen, L.H.Y. (1975). Poisson approximation for dependent trials. Ann. Probab. 3 534-545. MR0428387
[4] Choi, K.P. and Xia, A. (2002). Approximating the number of successes in independent trials: Binomial versus Poisson. Ann. Appl. Probab. 12 1139-1148. MR1936586
[5] Csiszár, I. and Shields, P. (2004). Information theory and statistics: A tutorial. Foundations and Trends in Communications and Information Theory 1 417-528.
[6] Davenport, H. and Pólya, G. (1949). On the product of two power series. Canad. J. Math. 1 1-5. MR0027306
[7] Dharmadhikari, S. and Joag-Dev, K. (1988). Unimodality, Convexity, and Applications. New York: Academic Press. MR0954608
[8] Ehm, W. (1991). Binomial approximation to the Poisson binomial distribution. Statist. Probab. Lett. 11 7-16. MR1093412
[9] Hardy, G.H., Littlewood, J.E. and Pólya, G. (1964). Inequalities. Cambridge, UK: Cambridge Univ. Press.
[10] Harremoës, P. (2001). Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans. Inform. Theory 47 2039-2041. MR1842536
[11] Hoeffding, W. (1956). On the distribution of the number of successes in independent trials. Ann. Math. Statist. 27 713-721. MR0080391
[12] Imhof, J.P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika 48 419-426. MR0137199
[13] Johnson, O. (2007). Log-concavity and the maximum entropy property of the Poisson distribution. Stochastic Process. Appl. 117 791-802. MR2327839
[14] Johnson, O., Kontoyiannis, I. and Madiman, M. (2008). On the entropy and log-concavity of compound Poisson measures. Preprint. Available at arXiv:0805.4112.
[15] Karlin, S. and Novikoff, A. (1963). Generalized convex inequalities. Pacific J. Math. 13 1251-1279. MR0156927
[16] Karlin, S. and Rinott, Y. (1981). Entropy inequalities for classes of probability distributions I: The univariate case. Adv. in Appl. Probab. 13 93-112. MR0595889
[17] Karlin, S. and Studden, W.J. (1966). Tchebycheff Systems: With Applications in Analysis and Statistics. New York: Interscience. MR0204922
[18] Kullback, S. (1959). Information Theory and Statistics. New York: Wiley. MR0103557
[19] Kullback, S. and Leibler, R.A. (1951). On information and sufficiency. Ann. Math. Statist. 22 79-86. MR0039968
[20] Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific J. Math. 10 1181-1197. MR0142174
[21] Liggett, T.M. (1997). Ultra logconcave sequences and negative dependence. J. Combin. Theory Ser. A 79 315-325. MR1462561
[22] Mateev, P. (1978). The entropy of the multinomial distribution. Teor. Veroyatn. Primen. 23 196-198. MR0490451
[23] Pemantle, R. (2000). Towards a theory of negative dependence. J. Math. Phys. 41 1371-1390. MR1757964






[24] Shaked, M. and Shanthikumar, J.G. (1994). Stochastic Orders and Their Applications. New York: Academic Press. MR1278322
[25] Shepp, L.A. and Olkin, I. (1981). Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution. In Contributions to Probability 201-206. New York: Academic Press. MR0618689
[26] Stein, C. (1986). Approximate Computation of Expectations. IMS Monograph Series 7. Hayward, CA: Inst. Math. Statist. MR0882007
[27] Whitt, W. (1985). Uniform conditional variability ordering of probability distributions. J. Appl. Probab. 22 619-633. MR0799285
[28] Yu, Y. (2008). On the maximum entropy properties of the binomial distribution. IEEE Trans. Inform. Theory 54 3351-3353. MR2450793
[29] Yu, Y. (2008). On an inequality of Karlin and Rinott concerning weighted sums of i.i.d. random variables. Adv. in Appl. Probab. 40 1223-1226. MR2488539
[30] Yu, Y. (2009). Stochastic ordering of exponential family distributions and their mixtures. J. Appl. Probab. 46 244-254. MR2508516
[31] Yu, Y. (2009). On the entropy of compound distributions on nonnegative integers. IEEE Trans. Inform. Theory 55 3645-3650.

Received March 2008 and revised May 2009 



